Phylogenomic clustering for selecting non-redundant genomes for comparative genomics

Authors

  • Gabriel Moreno-Hagelsieb
  • Zilin Wang
  • Stephanie Walsh
  • Aisha ElSherbiny
Abstract

MOTIVATION: Analyses in comparative genomics often require non-redundant genome datasets. Eliminating redundancy is not as simple as keeping one strain for each named species, because genomes might be redundant at a higher taxonomic level than that of species for some analyses; some strains with different species names can be as similar as most strains sharing a species name, whereas some strains sharing a species name can be so different that they should be put into different groups; and some genomes lack a species name.

RESULTS: We have implemented a method and Web server that clusters a genome dataset into groups of redundant genomes at different thresholds based on a few phylogenomic distance measures.

AVAILABILITY: The Web interface, similarity and distance data and R-scripts can be accessed at http://microbiome.wlu.ca/research/redundancy/.
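The abstract does not spell out the clustering procedure, so the following is only a rough illustration of the general idea of threshold-based redundancy clustering. It assumes a precomputed table of pairwise phylogenomic distances (lower means more similar) and uses single-linkage clustering: two genomes fall into the same redundant group if they are within the threshold of each other, directly or through a chain of such pairs, and one genome per group is then kept. The function names, the toy distances, and the choice of single linkage are all assumptions for illustration, not the authors' method or the Web server's code.

```python
from itertools import combinations


def cluster_at_threshold(names, dist, threshold):
    """Single-linkage clustering (an assumption, not the paper's stated method):
    genomes whose distance is <= threshold end up in the same group,
    directly or through a chain of such pairs."""
    parent = {n: n for n in names}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of every pair that is closer than the threshold.
    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    # Sort members within groups, then groups by their members, for stable output.
    return sorted(sorted(g) for g in groups.values())


def representatives(groups):
    """Keep one genome per redundant group (hypothetical rule: first by name)."""
    return [g[0] for g in groups]


# Toy pairwise distances: hypothetical values, not data from the paper.
names = ["g1", "g2", "g3", "g4"]
dist = {
    frozenset(("g1", "g2")): 0.02,
    frozenset(("g1", "g3")): 0.30,
    frozenset(("g1", "g4")): 0.35,
    frozenset(("g2", "g3")): 0.28,
    frozenset(("g2", "g4")): 0.33,
    frozenset(("g3", "g4")): 0.10,
}

for t in (0.05, 0.15, 0.40):
    groups = cluster_at_threshold(names, dist, t)
    print(t, groups, representatives(groups))
# 0.05 [['g1', 'g2'], ['g3'], ['g4']] ['g1', 'g3', 'g4']
# 0.15 [['g1', 'g2'], ['g3', 'g4']] ['g1', 'g3']
# 0.4 [['g1', 'g2', 'g3', 'g4']] ['g1']
```

Raising the threshold merges more genomes into each group, which mirrors the idea of treating genomes as redundant at a taxonomic level above species. Single linkage can chain distant genomes together through intermediates at loose thresholds; average or complete linkage would be more conservative.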

Similar resources

GreenPhylDB: a database for plant comparative genomics

GreenPhylDB (http://greenphyl.cirad.fr) is a comprehensive platform designed to facilitate comparative functional genomics in Oryza sativa and Arabidopsis thaliana genomes. The main functions of GreenPhylDB are to assign O. sativa and A. thaliana sequences to gene families using a semi-automatic clustering procedure and to create 'orthologous' groups using a phylogenomic approach. To date, Gree...

Comparative genomics of Synechococcus and proposal of the new genus Parasynechococcus

Synechococcus is among the most important contributors to global primary productivity. The genomes of several strains of this taxon have been previously sequenced in an effort to understand the physiology and ecology of these highly diverse microorganisms. Here we present a comparative study of Synechococcus genomes. For that end, we developed GenTaxo, a program written in Perl to perform genom...

Phylogenomics and comparative genomics of Lactobacillus salivarius, a mammalian gut commensal

The genus Lactobacillus is a diverse group with a combined species count of over 200. They are the largest group within the lactic acid bacteria and one of the most important bacterial groups involved in food microbiology and human nutrition because of their fermentative and probiotic properties. Lactobacillus salivarius, a species commonly isolated from the gastrointestinal tract of humans and...

A Phylogenomic Study of the Genus Alphavirus Employing Whole Genome Comparison

The phylogenetics of the genus Alphavirus have historically been characterized using partial gene, single gene or partial proteomic data. We have mined cDNA and amino acid sequences from GenBank for all fully sequenced and some partially sequenced alphaviruses and generated phylogenomic analyses of the genus Alphavirus, employing capsid encoding structural regions, non-structural coding r...

Assessing the Genotypic Differences between Strains of Corynebacterium pseudotuberculosis biovar equi through Comparative Genomics

Seven genomes of Corynebacterium pseudotuberculosis biovar equi were sequenced on the Ion Torrent PGM platform, generating high-quality scaffolds over 2.35 Mbp. This bacterium is the causative agent of the disease known as "pigeon fever" which commonly affects horses worldwide. The pangenome of biovar equi was calculated and two phylogenomic approaches were used to identify clustering patterns with...

Journal: Bioinformatics
Volume 29, Issue 7

Publication date: 2013